Smart Paper Notes

Large-scale protein-protein post-translational modification extraction with distant supervision and confidence calibrated BioBERT

Aparna Elangovan , Yuan Li , Douglas E. V. Pires , Melissa J. Davis , Karin Verspoor

Categories: Machine Learning | Natural Language Processing

2022-01-06

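Protein-protein interactions (PPIs) are critical to normal cellular function and are related to many disease pathways. However, only 4% of PPIs are annotated with PTMs in biological knowledge databases such as IntAct, mainly through manual curation, which is neither time- nor cost-effective. We use the IntAct PPI database to create a distantly supervised dataset annotated with interacting protein pairs, their corresponding PTM type, and associated abstracts from the PubMed database. We train an ensemble of BioBERT models, dubbed PPI-BioBERT-x10, to improve confidence calibration. We extend the ensemble average confidence approach with confidence variation to counteract the effects of class imbalance and extract high-confidence predictions. The PPI-BioBERT-x10 model evaluated on the test set resulted in a modest F1-micro of 41.3 (P = 58.1, R = 32.1). However, by combining high confidence and low variation to identify high-quality predictions, tuning the predictions for precision, we retained 19% of the test predictions with 100% precision. We evaluated PPI-BioBERT-x10 on 18 million PubMed abstracts, extracted 1.6 million PTM-PPI predictions (546,507 unique PTM-PPI triplets), and filtered about 5,700 (4,584 unique) high-confidence predictions. Of the 5,700, human evaluation on a small randomly sampled subset shows that the precision drops to 33.7% despite confidence calibration, which highlights the challenge of generalising beyond the test set even with confidence calibration. We circumvent the problem by only including predictions associated with multiple papers, improving the precision to 58.8%. In this work, we highlight the benefits and challenges of deep-learning-based text mining in practice, and the need for increased emphasis on confidence calibration to facilitate human curation efforts.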
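As a sanity check on the reported scores, F1 is the harmonic mean of precision and recall, so the stated P = 58.1 and R = 32.1 should reproduce roughly the stated F1-micro of 41.3. The sketch below computes that value and also illustrates what "high confidence and low variation" filtering over an ensemble could look like. The function names, the use of standard deviation as the variation measure, and the threshold values are illustrative assumptions, not details taken from the paper.

```python
from statistics import mean, pstdev


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def is_high_quality(member_confidences, min_mean=0.9, max_std=0.05):
    """Keep a prediction only if the ensemble is confident AND agrees.

    member_confidences: one confidence per ensemble member for the
    predicted class. Both thresholds are hypothetical values.
    """
    return (mean(member_confidences) >= min_mean
            and pstdev(member_confidences) <= max_std)


# Reported test-set precision and recall from the abstract.
print(round(f1_score(58.1, 32.1), 2))  # 41.35
```

The computed value of about 41.35 is in line with the reported 41.3, given that the precision and recall are themselves rounded.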

translated by Google Translate

Heliophysics Discovery Tools for the 21st Century: Data Science and Machine Learning Structures and Recommendations for 2020-2050

R. M. McGranaghan , B. Thompson , E. Camporeale , J. Bortnik , M. Bobra , G. Lapenta , S. Wing , B. Poduval , S. Lotz , S. Murray

Categories: Artificial Intelligence | Machine Learning

2022-12-26

Three main points: 1. Data Science (DS) will be increasingly important to heliophysics; 2. Methods of heliophysics science discovery will continually evolve, requiring the use of learning technologies [e.g., machine learning (ML)] that are applied rigorously and that are capable of supporting discovery; and 3. To grow with the pace of data, technology, and workforce changes, heliophysics requires a new approach to the representation of knowledge.


FAIR AI Models in High Energy Physics

Javier Duarte , Haoyang Li , Avik Roy , Ruike Zhu , E. A. Huerta , Daniel Diaz , Philip Harris , Raghav Kansal , Daniel S. Katz , Ishaan H. Kavoori

Categories: Machine Learning

2022-12-09

The findable, accessible, interoperable, and reusable (FAIR) data principles have provided a framework for examining, evaluating, and improving how we share data with the aim of facilitating scientific discovery. Efforts have been made to generalize these principles to research software and other digital products. Artificial intelligence (AI) models -- algorithms that have been trained on data rather than explicitly programmed -- are an important target for this because of the ever-increasing pace with which AI is transforming scientific and engineering domains. In this paper, we propose a practical definition of FAIR principles for AI models and create a FAIR AI project template that promotes adherence to these principles. We demonstrate how to implement these principles using a concrete example from experimental high energy physics: a graph neural network for identifying Higgs bosons decaying to bottom quarks. We study the robustness of these FAIR AI models and their portability across hardware architectures and software frameworks, and report new insights on the interpretability of AI predictions by studying the interplay between FAIR datasets and AI models. Enabled by publishing FAIR AI models, these studies pave the way toward reliable and automated AI-driven scientific discovery.


Formulation of problems of combinatorial optimization for solving problems of management and planning of cloud production

M. V. Saramud , E. A. Spirin , E. P. Talay , I. I. Pikalov

Categories: Robotics

2022-12-05

The application of combinatorial optimization problems to solving the problems of planning processes for industries based on a fund of reconfigurable production resources is considered. The results of their solution by mixed integer programming methods are presented.


From fat droplets to floating forests: cross-domain transfer learning using a PatchGAN-based segmentation model

Kameswara Bharadwaj Mantha , Ramanakumar Sankar , Yuping Zheng , Lucy Fortson , Thomas Pengo , Douglas Mashek , Mark Sanders , Trace Christensen , Jeffrey Salisbury , Laura Trouille

Categories: Machine Learning | Computer Vision

2022-11-08

Many scientific domains gather sufficient labels to train machine algorithms through human-in-the-loop techniques provided by the Zooniverse.org citizen science platform. As the range of projects, task types and data rates increase, acceleration of model training is of paramount concern to focus volunteer effort where most needed. The application of Transfer Learning (TL) between Zooniverse projects holds promise as a solution. However, understanding the effectiveness of TL approaches that pretrain on large-scale generic image sets vs. images with similar characteristics possibly from similar tasks is an open challenge. We apply a generative segmentation model on two Zooniverse project-based data sets: (1) to identify fat droplets in liver cells (FatChecker; FC) and (2) the identification of kelp beds in satellite images (Floating Forests; FF) through transfer learning from the first project. We compare and contrast its performance with a TL model based on the COCO image set, and subsequently with baseline counterparts. We find that both the FC and COCO TL models perform better than the baseline cases when using >75% of the original training sample size. The COCO-based TL model generally performs better than the FC-based one, likely due to its generalized features. Our investigations provide important insights into usage of TL approaches on multi-domain data hosted across different Zooniverse projects, enabling future projects to accelerate task completion.


A Constraint-Driven Approach to Line Flocking: The V Formation as an Energy-Saving Strategy

Logan E. Beaver , Christopher Kroninger , Michael Dorothy , Andreas A. Malikopoulos

Categories: Robotics

2022-09-23

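The study of robotic flocking has received significant attention over the past twenty years. In this article, we present a constraint-driven control algorithm that minimizes the energy consumption of individual agents and yields an emergent V formation. Because the formation emerges from decentralized interactions between agents, our approach is robust to the spontaneous addition or removal of agents. First, we present an analytical model for the trailing upwash behind a fixed-wing UAV, and we derive the optimal airspeed for trailing UAVs to maximize their travel endurance. Next, we prove that simply flying at the optimal airspeed will never lead to emergent flocking behavior, and we propose a new decentralized "anseroid" behavior that yields emergent V formations. We encode these behaviors in a constraint-driven control algorithm that minimizes the locomotive power of each UAV. Finally, we prove that UAVs initialized in an approximate V or echelon formation will converge under our proposed control law, and we demonstrate that this emergence occurs in real time in simulation and in physical experiments with a fleet of Crazyflie quadrotors.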


Self-Supervised Clustering on Image-Subtracted Data with Deep-Embedded Self-Organizing Map

Y. -L. Mong , K. Ackley , T. L. Killestein , D. K. Galloway , M. Dyer , R. Cutter , M. J. I. Brown , J. Lyman , K. Ulaczyk , D. Steeghs

Categories: Computer Vision

2022-09-14

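Developing an effective automatic classifier to separate genuine sources from artifacts is essential for transient follow-up in wide-field optical surveys. Identifying transient detections among the subtraction artifacts left by the image differencing process is a key step in such classifiers, known as the real-bogus classification problem. We apply a self-supervised machine learning model, the deep-embedded self-organizing map (DESOM), to this real-bogus classification problem. DESOM combines an autoencoder and a self-organizing map to perform clustering, distinguishing real from bogus detections based on their dimensionality-reduced representations. We use 32x32 normalized detection thumbnails as the input to DESOM. We demonstrate different model training approaches and find that our best DESOM classifier shows a missed detection rate of 6.6% with a false positive rate of 1.5%. DESOM offers a more nuanced way to fine-tune the decision boundary for identifying likely real detections when used in combination with other types of classifiers, for example those built on neural networks or decision trees. We also discuss other potential uses of DESOM and its limitations.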
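The two metrics quoted for the classifier can be made concrete with a small sketch. It assumes the usual definitions (missed detection rate = missed real detections / all real detections; false positive rate = bogus detections flagged as real / all bogus detections), which the abstract does not spell out; the function name and the toy labels are hypothetical.

```python
def real_bogus_rates(y_true, y_pred):
    """Return (missed_detection_rate, false_positive_rate).

    y_true, y_pred: equal-length sequences of 1 (real) or 0 (bogus).
    """
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1  # a real source the classifier missed
        elif truth == 0 and pred == 1:
            fp += 1  # an artifact wrongly accepted as real
        else:
            tn += 1
    missed = fn / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return missed, fpr


# Toy labels: 4 real and 4 bogus detections, one error of each kind.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
preds = [1, 1, 1, 0, 1, 0, 0, 0]
print(real_bogus_rates(labels, preds))  # (0.25, 0.25)
```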


A Robust Scientific Machine Learning for Optimization: A Novel Robustness Theorem

Luana P. Queiroz , Carine M. Rebello , Erbet A. Costa , Vinicius V. Santana , Alirio E. Rodrigues , Ana M. Ribeiro , Idelfonso B. R. Nogueira

Categories: Machine Learning

2022-09-13

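Scientific machine learning (SciML) is a field of growing interest across several application areas. In an optimization context, SciML-based tools have enabled the development of more efficient optimization methods. However, SciML tools for optimization must be evaluated rigorously and applied with caution. This work proposes the deduction of a robustness test that guarantees the robustness of multiobjective SciML-based optimization by showing that its results respect the universal approximation theorem. The test is applied within the framework of a novel methodology, which is evaluated on a series of benchmarks to illustrate its consistency. Moreover, the results of the proposed methodology are compared with the feasible regions of rigorous optimization, which requires a higher computational effort. Hence, this work provides a robustness test for the guaranteed application of SciML tools in multiobjective optimization with a lower computational effort than the existing alternative.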


A new Reinforcement Learning framework to discover natural flavor molecules

Luana P. Queiroz , Carine M. Rebello , Erbet A. Costa , Vinícius V. Santana , Bruno C. L. Rodrigues , Alírio E. Rodrigues , Ana M. Ribeiro , Idelfonso B. R. Nogueira

Categories: Machine Learning

2022-09-13

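Flavor is the focal point of the flavor industry, which follows social trends and behaviors. The research and development of new flavoring agents and molecules is essential in this field. At the same time, the development of natural flavors plays a critical role in modern society. In light of this, the present work proposes a novel framework based on scientific machine learning to address an emerging problem in flavor engineering and industry. This work therefore brings an innovative methodology for designing new natural flavor molecules. The molecules are evaluated with respect to synthetic accessibility, number of atoms, and likeness to a natural or pseudo-natural product.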


Graph Neural Networks for Low-Energy Event Classification & Reconstruction in IceCube

R. Abbasi , M. Ackermann , J. Adams , N. Aggarwal , J. A. Aguilar , M. Ahlers , M. Ahrens , J. M. Alameddine , A. A. Alves Jr. , N. M. Amin

Categories: Machine Learning

2022-09-07

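IceCube, a cubic-kilometer array of optical sensors built to detect atmospheric and astrophysical neutrinos between 1 GeV and 1 PeV, is deployed 1.45 km to 2.45 km below the surface of the ice sheet at the South Pole. The classification and reconstruction of events from the in-ice detectors play a central role in the analysis of IceCube data. Reconstructing and classifying events is a challenge because of the detector geometry, the inhomogeneous scattering and absorption of light in the ice and, below 100 GeV, the relatively low number of signal photons produced per event. To address this challenge, IceCube events can be represented as point cloud graphs, with a graph neural network (GNN) used as the classification and reconstruction method. The GNN is capable of distinguishing neutrino events from cosmic-ray backgrounds, classifying different neutrino event types, and reconstructing the deposited energy, direction, and interaction vertex. Based on simulation, we provide a comparison in the 1-100 GeV energy range with the state-of-the-art maximum likelihood techniques used in current IceCube analyses, including the effects of known systematic uncertainties. For neutrino event classification, the GNN increases the signal efficiency by 18% at a fixed false positive rate (FPR) compared to current IceCube methods. Alternatively, the GNN reduces the FPR by more than a factor of 8 (to below half a percent) at a fixed signal efficiency. For the reconstruction of energy, direction, and interaction vertex, the resolution improves by an average of 13%-20% compared to current maximum likelihood techniques. When running on a GPU, the GNN is capable of processing IceCube events at a rate nearly double the median IceCube trigger rate of 2.7 kHz, which opens the possibility of using low-energy neutrinos in online searches for transient events.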
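To make the idea of a "point cloud graph" concrete, the sketch below connects each point (for example, a sensor hit position) to its k nearest neighbors by Euclidean distance. This k-nearest-neighbor construction, the function name, and the toy coordinates are assumptions chosen for illustration; the abstract does not state how the graph edges are actually built.

```python
import math


def knn_edges(points, k):
    """Build directed edges (i, j) from each point to its k nearest neighbors.

    points: list of equal-length coordinate tuples.
    Requires k < len(points). Ties are broken by the lower index.
    """
    edges = []
    for i, p in enumerate(points):
        # Distance from point i to every other point, paired with its index.
        others = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        others.sort()
        edges.extend((i, j) for _, j in others[:k])
    return edges


# Four toy hit positions forming two well-separated pairs.
hits = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
print(knn_edges(hits, k=1))  # [(0, 1), (1, 0), (2, 3), (3, 2)]
```

A GNN would then pass messages along these edges, with per-point features attached to the nodes.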
